NSF PAR Search | NSF Public Access Repository

Noise covariance estimation in multi-task high-dimensional linear models

https://doi.org/10.3150/23-BEJ1644

Tan, Kai; Romon, Gabriel; Bellec, Pierre C (August 2024, Bernoulli)

Full Text Available
Corrected generalized cross-validation for finite ensembles of penalized estimators

https://doi.org/10.1093/jrsssb/qkae092

Bellec, Pierre C; Du, Jin-Hong; Koriyama, Takuya; Patil, Pratik; Tan, Kai (September 2024, Journal of the Royal Statistical Society Series B: Statistical Methodology)

Abstract: Generalized cross-validation (GCV) is a widely used method for estimating the squared out-of-sample prediction risk that applies a scalar degrees-of-freedom adjustment (in a multiplicative sense) to the squared training error. In this paper, we examine the consistency of GCV for estimating the prediction risk of arbitrary ensembles of penalized least-squares estimators. We show that GCV is inconsistent for any finite ensemble of size greater than one. Towards repairing this shortcoming, we identify a correction that involves an additional scalar correction (in an additive sense) based on degrees-of-freedom-adjusted training errors from each ensemble component. The proposed estimator (termed CGCV) maintains the computational advantages of GCV and requires neither sample splitting, model refitting, nor out-of-bag risk estimation. The estimator stems from a finer inspection of the ensemble risk decomposition and two intermediate risk estimators for the components in this decomposition. We provide a non-asymptotic analysis of CGCV and the two intermediate risk estimators for ensembles of convex penalized estimators under Gaussian features and a linear response model. Furthermore, in the special case of ridge regression, we extend the analysis to general feature and response distributions using random matrix theory, which establishes model-free uniform consistency of CGCV.
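For orientation, the multiplicative degrees-of-freedom adjustment described in the abstract can be illustrated with the classical GCV formula for a single ridge estimator. This is only a minimal sketch of standard GCV, not the CGCV correction proposed in the paper (the abstract does not state its formula). The function name `ridge_gcv`, the penalty scaling, and the simulated data are assumptions made for illustration.

```python
import numpy as np


def ridge_gcv(X, y, lam):
    """Classical GCV for one ridge estimator (illustrative, not CGCV).

    Assumed objective: ||y - X b||^2 + lam * ||b||^2.
    Returns (gcv, training_error, degrees_of_freedom).
    """
    n, p = X.shape
    gram = X.T @ X + lam * np.eye(p)
    beta = np.linalg.solve(gram, X.T @ y)
    # Degrees of freedom: trace of the hat matrix X (X'X + lam I)^{-1} X'.
    df = np.trace(X @ np.linalg.solve(gram, X.T))
    train_err = np.mean((y - X @ beta) ** 2)
    # Multiplicative adjustment of the squared training error.
    gcv = train_err / (1.0 - df / n) ** 2
    return gcv, train_err, df


# Simulated data (hypothetical): Gaussian features, linear response.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta_true + rng.standard_normal(n)

gcv, train_err, df = ridge_gcv(X, y, lam=1.0)
print(f"training error = {train_err:.3f}, df = {df:.1f}, GCV = {gcv:.3f}")
```

Because the degrees of freedom lie strictly between 0 and n here, the GCV estimate is always larger than the raw training error, which is the sense in which the adjustment corrects for optimism.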
Estimating Generalization Performance Along the Trajectory of Proximal SGD in Robust Regression

Tan, Kai; Bellec, Pierre C (January 2024, Advances in neural information processing systems)

Full Text Available
Multinomial Logistic Regression: Asymptotic Normality on Null Covariates in High-Dimensions

Tan, Kai; Bellec, Pierre C (December 2023, Advances in neural information processing systems)

Full Text Available
MDPGT: Momentum-Based Decentralized Policy Gradient Tracking

https://doi.org/10.1609/aaai.v36i9.21169

Jiang, Zhanhong; Lee, Xian Yeow; Tan, Sin Yong; Tan, Kai Liang; Balu, Aditya; Lee, Young M; Hegde, Chinmay; Sarkar, Soumik (June 2022, Proceedings of the AAAI Conference on Artificial Intelligence)

We propose a novel policy gradient method for multi-agent reinforcement learning, which leverages two different variance-reduction techniques and does not require large batches over iterations. Specifically, we propose a momentum-based decentralized policy gradient tracking (MDPGT) where a new momentum-based variance reduction technique is used to approximate the local policy gradient surrogate with importance sampling, and an intermediate parameter is adopted to track two consecutive policy gradient surrogates. MDPGT provably achieves the best available sample complexity of $O(N^{-1}\epsilon^{-3})$ for converging to an $\epsilon$-stationary point of the global average of $N$ local performance functions (possibly nonconcave). This outperforms the state-of-the-art sample complexity in decentralized model-free reinforcement learning and when initialized with a single trajectory, the sample complexity matches those obtained by the existing decentralized policy gradient methods. We further validate the theoretical claim for the Gaussian policy function. When the required error tolerance $\epsilon$ is small enough, MDPGT leads to a linear speed up, which has been previously established in decentralized stochastic optimization, but not for reinforcement learning. Lastly, we provide empirical results on a multi-agent reinforcement learning benchmark environment to support our theoretical findings.
Full Text Available
Query-based targeted action-space adversarial policies on deep reinforcement learning agents

https://doi.org/10.1145/3450267.3450537

Lee, Xian Yeow; Esfandiari, Yasaman; Tan, Kai Liang; Sarkar, Soumik (May 2021, ICCPS '21: Proceedings of the ACM/IEEE 12th International Conference on Cyber-Physical Systems)
Robust Deep Reinforcement Learning for Traffic Signal Control

https://doi.org/10.1007/s42421-020-00029-6

Tan, Kai Liang; Sharma, Anuj; Sarkar, Soumik (December 2020, Journal of Big Data Analytics in Transportation)
Full Text Available
Probing biased activation of mu-opioid receptor by the biased agonist PZM21 using all atom molecular dynamics simulation

https://doi.org/10.1016/j.lfs.2021.119026

Liao, Siyan; Tan, Kai; Floyd, Cecilia; Bong, Daegun; Pino, Michael James; Wu, Chun (March 2021, Life Sciences)
Full Text Available
Spatiotemporally Constrained Action Space Attacks on Deep Reinforcement Learning Agents

https://doi.org/10.1609/aaai.v34i04.5887

Lee, Xian Yeow; Ghadai, Sambit; Tan, Kai Liang; Hegde, Chinmay; Sarkar, Soumik (June 2020, Proceedings of the AAAI Conference on Artificial Intelligence)
The robustness of Deep Reinforcement Learning (DRL) algorithms against adversarial attacks in real-world applications, such as those deployed in cyber-physical systems (CPS), is of increasing concern. Numerous studies have investigated the mechanisms of attacks on the RL agent's state space. Nonetheless, attacks on the RL agent's action space (corresponding to actuators in engineering systems) are equally perverse, but such attacks are relatively less studied in the ML literature. In this work, we first frame the problem as an optimization problem of minimizing the cumulative reward of an RL agent with decoupled constraints as the budget of attack. We propose the white-box Myopic Action Space (MAS) attack algorithm that distributes the attacks across the action space dimensions. Next, we reformulate the optimization problem above with the same objective function, but with a temporally coupled constraint on the attack budget to take into account the approximated dynamics of the agent. This leads to the white-box Look-ahead Action Space (LAS) attack algorithm that distributes the attacks across the action and temporal dimensions. Our results showed that, using the same amount of resources, the LAS attack deteriorates the agent's performance significantly more than the MAS attack. This reveals the possibility that with limited resources, an adversary can utilize the agent's dynamics to malevolently craft attacks that cause the agent to fail. Additionally, we leverage these attack strategies as a possible tool to gain insights into the potential vulnerabilities of DRL agents.
Full Text Available
Deep Reinforcement Learning for Adaptive Traffic Signal Control

https://doi.org/10.1115/DSCC2019-9076

Tan, Kai Liang; Poddar, Subhadipto; Sarkar, Soumik; Sharma, Anuj (October 2019, ASME Dynamic Systems and Control Conference)

Abstract: Many existing traffic signal controllers are either simple adaptive controllers based on sensors placed around traffic intersections, or optimized by traffic engineers on a fixed schedule. Optimizing traffic controllers is time-consuming and usually requires experienced traffic engineers. Recent research has demonstrated the potential of using deep reinforcement learning (DRL) in this context. However, most of the studies do not consider realistic settings that could seamlessly transition into deployment. In this paper, we propose a DRL-based adaptive traffic signal control framework that explicitly considers realistic traffic scenarios, sensors, and physical constraints. In this framework, we also propose a novel reward function that shows significantly improved traffic performance compared to the typical baseline pre-timed and fully-actuated traffic signal controllers. The framework is implemented and validated on a simulation platform emulating real-life traffic scenarios and sensor data streams.